
PERF: vectorize _range_from_fields and _assemble_from_unit_mappings#65195

Open
jbrockmendel wants to merge 2 commits into pandas-dev:main from jbrockmendel:perf-range_from_fields



@jbrockmendel
Member

Summary

  • Add period_ordinals_from_fields Cython function that converts arrays of date/time fields to period ordinals in a single C-level loop, with optional date validation
  • Vectorize _range_from_fields to call the new Cython function instead of looping in Python and appending to a list; vectorize the quarter-to-calendar-month conversion with numpy operations
  • Reuse the same function in _assemble_from_unit_mappings with freq=FR_US to construct datetime64[us] directly from field arrays, avoiding the object-dtype round-trip through ensure_object + array_strptime with format="%Y%m%d"
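The quarter-to-calendar-month step from the second bullet reduces to integer arithmetic on whole arrays. The sketch below is an illustration, not the PR's code: `quarters_to_year_month`, `monthly_ordinals`, and the `fiscal_end_month` parameter are hypothetical names, and it assumes pandas' convention that a fiscal year is labeled by the calendar year in which it ends (so Q-MAR fiscal 2020 runs from April 2019 through March 2020).

```python
import numpy as np


def quarters_to_year_month(year, quarter, fiscal_end_month=12):
    """Map (fiscal year, quarter) arrays to the calendar (year, month) of
    each quarter's first month, with no Python-level loop.

    Hypothetical helper; assumes a fiscal year is labeled by the calendar
    year in which it ends.
    """
    year = np.asarray(year, dtype=np.int64)
    quarter = np.asarray(quarter, dtype=np.int64)
    if ((quarter < 1) | (quarter > 4)).any():
        raise ValueError("Quarter must be 1 <= q <= 4")
    # First calendar month (1-12) of each quarter.
    month = (fiscal_end_month + (quarter - 1) * 3) % 12 + 1
    # Quarters starting after the fiscal year-end month belong to the
    # previous calendar year.
    year = np.where(month > fiscal_end_month, year - 1, year)
    return year, month


def monthly_ordinals(year, month):
    # Months elapsed since 1970-01 (the ordinal of a monthly Period).
    year = np.asarray(year, dtype=np.int64)
    month = np.asarray(month, dtype=np.int64)
    return (year - 1970) * 12 + (month - 1)


years, months = quarters_to_year_month([2020] * 4, [1, 2, 3, 4], fiscal_end_month=3)
print(years.tolist(), months.tolist())  # [2019, 2019, 2019, 2020] [4, 7, 10, 1]
```

The year rollback is a single `np.where` over the whole array, which is what removes the per-element Python branch.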
| Benchmark | Before | After | Speedup |
|---|---|---|---|
| PeriodIndex.from_fields (2k monthly) | 0.47 ms | 0.03 ms | 16x |
| PeriodIndex.from_fields (100k monthly) | 22.7 ms | 0.92 ms | 25x |
| to_datetime(DataFrame), 100k unique dates | 14.9 ms | 1.7 ms | 9x |
| to_datetime(DataFrame), 100k repeated dates | 1.6 ms | 1.7 ms | ~1x (parity) |

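The old to_datetime(DataFrame) path relied on _maybe_cache to avoid redundant work on repeated values, but with unique dates it degraded to ~15 ms because every element went through str() and strptime. The new path takes about the same time (~1.7 ms at 100k rows) whether or not the dates repeat.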
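To illustrate why the old approach scaled with the number of unique values, the sketch below contrasts a per-element `str()` + `strptime` conversion of packed `YYYYMMDD` integers (an approximation of the old path, not pandas' actual code) with a vectorized numpy computation of the same dates. `via_strptime` and `via_arithmetic` are hypothetical names.

```python
from datetime import datetime

import numpy as np


def via_strptime(year, month, day):
    # Approximation of the old path: pack fields into YYYYMMDD integers,
    # then stringify and parse each element in a Python loop.
    packed = np.asarray(year) * 10000 + np.asarray(month) * 100 + np.asarray(day)
    parsed = [datetime.strptime(str(v), "%Y%m%d") for v in packed.tolist()]
    return np.array(parsed, dtype="datetime64[us]")


def via_arithmetic(year, month, day):
    # Vectorized alternative: month offset from 1970-01 -> first day of that
    # month, then add (day - 1) days; no per-element Python work.
    months = (np.asarray(year, dtype=np.int64) - 1970) * 12 + (
        np.asarray(month, dtype=np.int64) - 1
    )
    first = months.astype("datetime64[M]").astype("datetime64[D]")
    offset = (np.asarray(day, dtype=np.int64) - 1).astype("timedelta64[D]")
    return (first + offset).astype("datetime64[us]")


year = np.array([2020, 2021, 2024])
month = np.array([1, 6, 2])
day = np.array([15, 30, 29])
assert (via_strptime(year, month, day) == via_arithmetic(year, month, day)).all()
```

One difference matters: the arithmetic version silently rolls an invalid date such as June 31 over to July 1, whereas `strptime` raises, so a vectorized path needs its own day-of-month check (the PR's Cython function has optional date validation for this reason).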

Test plan

  • pandas/tests/indexes/period/test_constructors.py (108 + 3 new tests pass)
  • pandas/tests/tools/test_to_datetime.py (939 + 9 new tests pass)
  • pandas/tests/indexes/period/ (466 tests pass)
  • pandas/tests/arrays/period/ (40 tests pass)
  • mypy, pyright, pre-commit all clean

New tests cover: quarterly periods with a non-December fiscal year end, hourly periods built from all six fields, empty arrays, leap-year Feb 29 validation, invalid day-of-month (raise and coerce), coercion of fractional floats, an empty DataFrame, and UTC with time fields.
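Several of the new tests exercise day-of-month validation. One vectorized way to check days is to compare each against its month's length, computed as the gap between consecutive month starts so leap years fall out of the calendar arithmetic. This sketch is not the PR's implementation: `days_in_month`, `validate_days`, and the error message are hypothetical, and the `errors` argument only mimics `to_datetime`'s raise/coerce semantics.

```python
import numpy as np


def days_in_month(year, month):
    # Month length = next month's first day minus this month's first day.
    months = (np.asarray(year, dtype=np.int64) - 1970) * 12 + (
        np.asarray(month, dtype=np.int64) - 1
    )
    start = months.astype("datetime64[M]").astype("datetime64[D]")
    end = (months + 1).astype("datetime64[M]").astype("datetime64[D]")
    return (end - start).astype(np.int64)


def validate_days(year, month, day, errors="raise"):
    """Return datetime64[D] dates; invalid rows raise or become NaT.

    Hypothetical helper mimicking errors="raise" / errors="coerce".
    """
    year = np.asarray(year, dtype=np.int64)
    month = np.asarray(month, dtype=np.int64)
    day = np.asarray(day, dtype=np.int64)
    invalid = (month < 1) | (month > 12)
    # Clip month so the month-length lookup is defined even for bad months.
    safe_month = np.clip(month, 1, 12)
    invalid |= (day < 1) | (day > days_in_month(year, safe_month))
    if errors == "raise" and invalid.any():
        pos = int(np.argmax(invalid))  # first offending row
        raise ValueError(f"day is out of range for month at position {pos}")
    months = (year - 1970) * 12 + (safe_month - 1)
    out = months.astype("datetime64[M]").astype("datetime64[D]")
    out = out + (day - 1).astype("timedelta64[D]")
    out[invalid] = np.datetime64("NaT")  # coerce: invalid rows become NaT
    return out
```

Because the month length comes from calendar arithmetic rather than a lookup table, Feb 29 is accepted in 2024 and 2000 but rejected in 2023 and 1900 without any special-casing.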

🤖 Generated with Claude Code

jbrockmendel added the Performance (Memory or execution speed performance) label on Apr 12, 2026
Add period_ordinals_from_fields Cython function that converts arrays
of year/month/day/hour/minute/second fields to period ordinals in a
single C-level loop, replacing the Python-space list-append loop in
_range_from_fields.

Reuse the same function in to_datetime's _assemble_from_unit_mappings
with freq=FR_US to construct datetime64[us] directly from field arrays,
avoiding the object-dtype round-trip through ensure_object + array_strptime
with format="%Y%m%d".

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
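The commit's idea of producing `datetime64[us]` straight from field arrays, skipping the object-dtype string round-trip, can be approximated in numpy. This is a sketch of the arithmetic only, under the assumption of already-validated integer inputs; `fields_to_us` is a hypothetical helper, not the Cython `period_ordinals_from_fields` function or its signature.

```python
import numpy as np


def _i8(values):
    # Normalize scalars/sequences to int64 arrays (0-d for scalars).
    return np.asarray(values, dtype=np.int64)


def fields_to_us(year, month, day, hour=0, minute=0, second=0):
    """Build datetime64[us] values from integer field arrays.

    Hypothetical helper: assumes integer, already-validated inputs (no NaN,
    day within the month); it performs no range checks of its own.
    """
    # Months since 1970-01, cast to month resolution, then to days.
    months = (_i8(year) - 1970) * 12 + (_i8(month) - 1)
    date = months.astype("datetime64[M]").astype("datetime64[D]")
    date = date + (_i8(day) - 1).astype("timedelta64[D]")
    # Time of day folded into a single microsecond offset.
    micros = ((_i8(hour) * 60 + _i8(minute)) * 60 + _i8(second)) * 1_000_000
    return date.astype("datetime64[us]") + micros.astype("timedelta64[us]")


stamps = fields_to_us([2020, 1969], [3, 12], [1, 31], [13, 23], [5, 59], [7, 59])
```

Scalar defaults for the time fields broadcast against the date arrays, so date-only input needs no extra handling.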
jbrockmendel force-pushed the perf-range_from_fields branch from b9f6593 to 3dba4f6 on April 16, 2026 20:16
Replace nonlocal-based fractional tracking with a tuple return; fold
the six per-field conversion calls into a single loop over field_spec
that also drives the NaN-mask default-fill step.
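The paragraph above describes replacing a `nonlocal` flag with a tuple return and driving per-field conversion and NaN default-filling from one loop over `field_spec`. A rough sketch of that shape follows; `FIELD_SPEC`, its fill values, and `convert_fields` are hypothetical stand-ins, since the PR's actual `field_spec` contents are not shown here.

```python
import numpy as np

# Hypothetical spec (the PR's field_spec may differ): (field name, fill value).
# The fill is used for absent optional columns and as a placeholder for NaN
# rows so integer conversion succeeds; NaN rows are tracked separately and
# would be set to NaT afterwards (assumption).
FIELD_SPEC = [
    ("year", 1970),
    ("month", 1),
    ("day", 1),
    ("hour", 0),
    ("minute", 0),
    ("second", 0),
]


def convert_fields(columns):
    """Convert a mapping of field name -> values into int64 arrays.

    Returns a tuple (fields, nan_mask, fractional_mask) instead of reporting
    fractional values through a ``nonlocal`` flag. "year" is required.
    """
    n = len(columns["year"])
    fields = {}
    nan_mask = np.zeros(n, dtype=bool)
    fractional = np.zeros(n, dtype=bool)
    for name, fill in FIELD_SPEC:
        if name not in columns:
            # Optional field absent: use the default for every row.
            fields[name] = np.full(n, fill, dtype=np.int64)
            continue
        values = np.asarray(columns[name], dtype=np.float64)
        isnan = np.isnan(values)
        nan_mask |= isnan
        filled = np.where(isnan, fill, values)
        # Non-integer values are flagged here; the caller decides whether
        # to raise or coerce them.
        fractional |= filled != np.floor(filled)
        fields[name] = filled.astype(np.int64)
    return fields, nan_mask, fractional
```

Returning the masks lets the caller combine them with any later validation mask before deciding between raising and coercing, with no shared mutable state between the per-field steps.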

Replace the "rerun through %Y%m%d strptime just to borrow its error
message" fallback with a direct ValueError naming the offending column.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
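This commit also replaces the `%Y%m%d` strptime fallback with a `ValueError` that names the offending column. Assuming per-column boolean masks of invalid rows are already available, that check can be as small as the sketch below; `raise_for_invalid` and the message wording are hypothetical, not pandas' actual error text.

```python
import numpy as np


def raise_for_invalid(invalid_by_column):
    """Raise a ValueError naming the first column with an invalid entry.

    ``invalid_by_column`` maps column name -> boolean mask of bad rows
    (hypothetical input shape). Returns None if every mask is all False.
    """
    for column, mask in invalid_by_column.items():
        mask = np.asarray(mask, dtype=bool)
        if mask.any():
            row = int(np.argmax(mask))  # position of the first offending row
            raise ValueError(
                f"cannot assemble the datetimes: invalid value in column "
                f"{column!r} at position {row}"
            )
```

Reporting the column directly avoids a second parsing pass whose only purpose was producing an error message.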
jbrockmendel marked this pull request as ready for review on April 17, 2026 00:59
